Supplementary Material: Asynchronous Stochastic Gradient Descent with Delay Compensation
Abstract
where $C_{ij} = \frac{1}{1+\lambda}\left(\frac{u_i u_j \beta}{l_i l_j \sqrt{\alpha}}\right)$ and $C'_{ij} = \frac{1}{(1+\lambda)\,\alpha\, l_i l_j}$, and the model converges to the optimal model, then the MSE of $\lambda G(w_t)$ is smaller than the MSE of $G(w_t)$ in approximating the Hessian $H(w_t)$.

Proof: For simplicity, we abbreviate $\mathbb{E}_{(Y|x,w^*)}$ as $\mathbb{E}$, $G(w_t)$ as $G_t$, and $H(w_t)$ as $H_t$. First, we calculate the MSE of $G_t$ and of $\lambda G_t$ in approximating $H_t$ element by element. We denote the element in the $i$-th row and $j$-th column of $G(w_t)$ by $G_{ij}$ and that of $H(w_t)$ by $H_{ij}(t)$. The MSE of $G_{ij}$ is

$$\mathbb{E}\big(G_{ij} - \mathbb{E} H_{ij}\big)^2 = \mathbb{E}\big(G_{ij} - \mathbb{E} G_{ij}\big)^2 + \big(\mathbb{E} H_{ij} - \mathbb{E} G_{ij}\big)^2 = \mathbb{E}\big(G_{ij}^2\big) - \big(\mathbb{E} G_{ij}\big)^2 + \varepsilon_t, \qquad (2)$$

where $\varepsilon_t = \big(\mathbb{E} H_{ij} - \mathbb{E} G_{ij}\big)^2$ is the squared bias term, which vanishes as the model converges to the optimum.
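From (2), comparing the two estimators reduces to a standard shrinkage (bias-variance) trade-off. A minimal sketch of that comparison, in the idealized limit where the bias $\varepsilon_t$ has vanished and assuming $\lambda \in (0,1)$; this is an illustration only, not the paper's full argument, which keeps $\varepsilon_t$ and the thresholds $C_{ij}$, $C'_{ij}$ explicit:

$$\mathbb{E}\big(\lambda G_{ij} - \mathbb{E} G_{ij}\big)^2 = \lambda^2\,\mathrm{Var}(G_{ij}) + (1-\lambda)^2\big(\mathbb{E} G_{ij}\big)^2,$$

so $\mathrm{MSE}(\lambda G_{ij}) < \mathrm{MSE}(G_{ij}) = \mathrm{Var}(G_{ij})$ exactly when

$$\big(\mathbb{E} G_{ij}\big)^2 < \frac{1+\lambda}{1-\lambda}\,\mathrm{Var}(G_{ij}),$$

i.e., the shrunken estimator $\lambda G_{ij}$ wins whenever the variance of $G_{ij}$ dominates its squared mean, which is roughly the role the thresholds $C_{ij}$ and $C'_{ij}$ play in the statement above.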
Similar resources
Asynchronous Stochastic Gradient Descent with Delay Compensation
With the fast development of deep learning, people have started to train very big neural networks using massive data. Asynchronous Stochastic Gradient Descent (ASGD) is widely used to fulfill this task, which, however, is known to suffer from the problem of delayed gradient. That is, when a local worker adds the gradient it calculates to the global model, the global model may have been updated ...
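A minimal sketch of the delay-compensated update the paper proposes: when the parameter server receives a gradient $g(w_t)$ computed on a stale model $w_t$ while the global model has moved to $w_{t+\tau}$, it applies the gradient plus a first-order correction $\lambda\, g(w_t) \odot g(w_t) \odot (w_{t+\tau} - w_t)$. The function name dc_asgd_server_update, the learning rate, and the value of lambda below are illustrative, not taken from the paper's code.

import numpy as np

def dc_asgd_server_update(w_current, g_stale, w_backup, lr=0.1, lam=0.04):
    """One parameter-server step of delay-compensated ASGD (sketch).

    w_current: current global model w_{t+tau}
    g_stale:   gradient g(w_t) computed by a worker on the stale model w_t
    w_backup:  copy of the model w_t that the worker used
    lam:       variance-control coefficient lambda
    """
    # First-order compensation of the delayed gradient using the element-wise
    # outer-product approximation of the Hessian:
    # g(w_{t+tau}) ~ g(w_t) + lam * g(w_t) * g(w_t) * (w_{t+tau} - w_t)
    compensated = g_stale + lam * g_stale * g_stale * (w_current - w_backup)
    return w_current - lr * compensated

# Toy usage: quadratic loss f(w) = 0.5 * ||w||^2, whose gradient is g(w) = w.
w_backup = np.array([1.0, -2.0])      # model snapshot the worker pulled
g_stale = w_backup.copy()             # gradient computed on the stale model
w_current = np.array([0.8, -1.6])     # global model after other workers' updates
print(dc_asgd_server_update(w_current, g_stale, w_backup))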
Accelerating Asynchronous Algorithms for Convex Optimization by Momentum Compensation
Asynchronous algorithms have attracted much attention recently due to the pressing demand for solving large-scale optimization problems. However, accelerated versions of asynchronous algorithms are rarely studied. In this paper, we propose the "momentum compensation" technique to accelerate asynchronous algorithms for convex problems. Specifically, we first accelerate the plain Asynchronous ...
Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization
Nowadays, asynchronous parallel algorithms have received much attention in the optimization field due to the demands of modern large-scale optimization problems. However, most asynchronous algorithms focus on convex problems, and analysis of nonconvex problems is lacking. For the Asynchronous Stochastic Gradient Descent (ASGD) algorithm, the best result from (Lian et al., 2015) can only achieve an ...
The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory
Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on distributed machine learning, significant work has been dedicated to the convergence properties of this algorithm under the inconsistent and noisy updates arising from...
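To make the shared-memory setting concrete, here is a generic lock-free SGD sketch in the Hogwild! style, where worker threads read and update a shared parameter vector without synchronization. It only illustrates the inconsistent-update regime the abstract refers to; the least-squares objective, worker count, and step sizes are illustrative and not the specific model analyzed in that paper.

import threading
import numpy as np

# Shared parameters, read and written by all workers without locks,
# so reads may observe a partially updated ("inconsistent") model.
dim = 10
w = np.zeros(dim)
X = np.random.randn(1000, dim)
y = X @ np.random.randn(dim) + 0.1 * np.random.randn(1000)

def worker(steps=2000, lr=0.01):
    rng = np.random.default_rng()
    for _ in range(steps):
        i = rng.integers(len(X))
        # Read a (possibly inconsistent) snapshot of the shared model.
        w_local = w.copy()
        grad = (X[i] @ w_local - y[i]) * X[i]   # least-squares gradient on one sample
        # Unsynchronized in-place update of the shared parameters.
        w[:] -= lr * grad

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final mean squared error:", float(np.mean((X @ w - y) ** 2)))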
Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization
We provide the first theoretical analysis of the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) algorithm on nonconvex optimization. Recent studies have shown that asynchronous stochastic gradient descent (SGD) based algorithms with variance reduction converge at a linear rate on convex problems. However, there is no work to analyze asy...
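For reference, the variance-reduced gradient estimator at the core of SVRG, which these asynchronous variants parallelize, can be sketched as follows. This is the standard serial form only; the least-squares objective and the function name svrg_epoch are illustrative, and none of the paper's asynchronous machinery is reproduced here.

import numpy as np

def svrg_epoch(w, X, y, lr=0.05, inner_steps=200, rng=None):
    """One outer epoch of (serial) SVRG on a least-squares objective (sketch)."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(X)
    w_snapshot = w.copy()
    # Full gradient at the snapshot, computed once per epoch.
    full_grad = X.T @ (X @ w_snapshot - y) / n
    for _ in range(inner_steps):
        i = rng.integers(n)
        grad_i = (X[i] @ w - y[i]) * X[i]                # stochastic gradient at w
        grad_i_snap = (X[i] @ w_snapshot - y[i]) * X[i]  # same sample at the snapshot
        # Variance-reduced gradient estimate.
        v = grad_i - grad_i_snap + full_grad
        w = w - lr * v
    return w

# Toy usage.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
y = X @ rng.standard_normal(5)
w = np.zeros(5)
for _ in range(10):
    w = svrg_epoch(w, X, y, rng=rng)
print("training loss:", float(np.mean((X @ w - y) ** 2)))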